Plotting codons used in each strain

Here we go reading up all my tables and merging ’em together. I haven’t incorporated read() or the join function yet, but its faster cause smaller files:

PB1 <- merge_all(list(read.csv(file='H2N2PB1.tidy.csv', head=TRUE, sep='\t'), 
                      read.csv(file='H1N1PB1.tidy.csv', head=TRUE, sep='\t'), 
                      read.csv(file='H3N2PB1.tidy.csv', head=TRUE, sep='\t')), 
                 all.x=TRUE, all.y=TRUE)
codonvalues <- read.csv(file='codonvalues.csv', head=TRUE, sep='\t')

Now I gotta table with: strain, sero, year, coryear, codon, numcodon, numaa. I also have another table holding a bunch of values associated with each codon.

Now I can plot!

Number of each codon found in strain

ggplot(PB1, aes(x=coryear, y=(numcodon), color=sero)) + 
  geom_point() + 
  geom_smooth(method='lm', se=FALSE) +
  background_grid(major = "xy", minor = "none") + 
  facet_wrap(~codon)

Proportion of each codon used for its amino acid in strain

ggplot(PB1, aes(x=coryear, y=(numcodon/numaa), color=sero)) + 
  geom_point() + 
  geom_smooth(method='lm', se=FALSE) +
  background_grid(major = "xy", minor = "none") + 
  facet_wrap(~codon)
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).

Pulling out linear regression

PB1 %>% group_by(sero, codon) %>% do(lm.result = lm((numcodon) ~ coryear, data=.)) -> fitted.models
intslope <- do(fitted.models, data.frame(sero=.$sero,codon=.$cod, 
                             intercept=coef(.$lm.result)[1], 
                             slope=coef(.$lm.result)[2]))
rm(fitted.models)
intslope
## Source: local data frame [192 x 4]
## Groups: <by row>
## 
##    sero codon intercept        slope
## 1  H2N2   AAA 21.514716  0.217697346
## 2  H2N2   AAC 17.894182 -0.056736985
## 3  H2N2   AAG 30.546019 -0.262504253
## 4  H2N2   AAT 30.045509  0.259931099
## 5  H2N2   ACA 38.227714 -0.249489622
## 6  H2N2   ACC 10.034791  0.293573494
## 7  H2N2   ACG  2.964274  0.007974651
## 8  H2N2   ACT  8.976438  0.025348758
## 9  H2N2   AGA 28.184161  0.019160429
## 10 H2N2   AGC 13.557843  0.103160088
## ..  ...   ...       ...          ...

Nice! slopes and intercetps GOT! Next up: p values

Note by Claus: You should be able to obtain p values directly from the linear regression using summary(.$lm.result)$coef[4,2] in the above code.

PB1 %>% group_by(sero, codon) %>%
  do(cor.res = cor.test(.$numcodon, .$coryear)) -> cors
## Warning in cor(x, y): the standard deviation is zero
## Warning in cor(x, y): the standard deviation is zero
## Warning in cor(x, y): the standard deviation is zero
## Warning in cor(x, y): the standard deviation is zero
corp <- do(cors, data.frame(sero=.$sero, codon=.$codon, cor=.$cor.res$estimate, p=.$cor.res$p.value))
rm(cors)
corp
## Source: local data frame [192 x 4]
## Groups: <by row>
## 
##    sero codon        cor            p
## 1  H2N2   AAA  0.7683672 0.000000e+00
## 2  H2N2   AAC -0.3202454 1.161447e-03
## 3  H2N2   AAG -0.7723004 5.053832e-21
## 4  H2N2   AAT  0.7522338 0.000000e+00
## 5  H2N2   ACA -0.8025226 1.039683e-23
## 6  H2N2   ACC  0.8240146 0.000000e+00
## 7  H2N2   ACG  0.1093708 2.787148e-01
## 8  H2N2   ACT  0.2918470 3.217027e-03
## 9  H2N2   AGA  0.1178618 2.428569e-01
## 10 H2N2   AGC  0.5003398 1.153826e-07
## ..  ...   ...        ...          ...

I mostly just copy-pasted the stats stuff. I think I understand them, but I’ll have to mess around with them a bit more.

Anyways, now ima dump that into a table with CAI and rtAI. I’m also including a table that shows the change in codon usage between my PB1 construct and the WT Udorn PB1.

codonstats <- select(corp, sero, codon, p) %>% 
  left_join(select(intslope, codon, sero, slope)) %>% 
  left_join(select(codonvalues, codon, withIFN, noIFN, CAI)) %>% 
  mutate(rtAI = withIFN/noIFN) %>% 
  left_join(read.csv(file='construct_v_WT_change.csv', head=TRUE, sep='\t'))
## Joining by: c("sero", "codon")
## Joining by: "codon"
## Joining by: "codon"
## Warning in left_join_impl(x, y, by$x, by$y): joining factors with different
## levels, coercing to character vector

Here are plots comparing slope, rtAI, and CAI. I’m expecting positive slopes for rtAI vs slope, negative for the rest I guess.

ggplot(codonstats, aes(x=rtAI, y=slope, label=codon)) + 
  geom_point() + 
  background_grid(major = "xy", minor = "none") + 
  geom_text(size = 5) + 
  facet_wrap(~sero)

ggplot(codonstats, aes(x=CAI, y=slope, label=codon)) + 
  geom_point() + 
  background_grid(major = "xy", minor = "none") + 
  geom_text(size = 5) + 
  facet_wrap(~sero)

ggplot(codonstats, aes(x=rtAI, y=CAI, label=codon)) + 
  geom_point() + 
  background_grid(major = "xy", minor = "none") + 
  geom_text(size = 5) + 
  facet_wrap(~sero)

Here are plots comparing the change between my older CAI construct and the WT with the rtAI, CAI and slope.

codonstats %>% filter(sero == "H3N2") %>% 
  ggplot(aes(x=change_con, y=rtAI, label=codon)) + 
  geom_point() + 
  background_grid(major = "xy", minor = "none") + 
  geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).

codonstats %>% filter(sero == "H3N2") %>% 
  ggplot(aes(x=change_con, y=CAI, label=codon)) + 
  geom_point() + 
  background_grid(major = "xy", minor = "none") + 
  geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).

codonstats %>% filter(sero == "H3N2") %>% 
  ggplot(aes(x=change_con, y=slope, label=codon)) + 
  geom_point() + 
  background_grid(major = "xy", minor = "none") + 
  geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).

Here are plots comparing the change between my construct exactly replicating newer H3N2 codon usage and the WT with the rtAI, CAI and slope.

codonstats %>% filter(sero == "H3N2") %>% 
  ggplot(aes(x=change_new, y=rtAI, label=codon)) + 
  geom_point() + 
  background_grid(major = "xy", minor = "none") + 
  geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).

codonstats %>% filter(sero == "H3N2") %>% 
  ggplot(aes(x=change_new, y=CAI, label=codon)) + 
  geom_point() + 
  background_grid(major = "xy", minor = "none") + 
  geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).

codonstats %>% filter(sero == "H3N2") %>% 
  ggplot(aes(x=change_new, y=slope, label=codon)) + 
  geom_point() + 
  background_grid(major = "xy", minor = "none") + 
  geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).

Comparing the change in codon usage from WT for both constructs.

codonstats %>% filter(sero == "H3N2") %>% 
  ggplot(aes(x=change_new, y=change_con, label=codon)) + 
  geom_point() + 
  background_grid(major = "xy", minor = "none") + 
  geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).

I’m really not sure how to interpret this, though…